Amendments to the Claims:

This listing of claims will replace all prior versions and listings of claims in the application: 
Listing of Claims: 

1. (currently amended) A method of executing a plurality of threads within a 
single programmable processor, the method comprising: 

storing a plurality of data elements in partitioned fields of at least one register 
having a register width, each of the data elements having an elemental width smaller than the 
register width; 

receiving an instruction stream for each one of the plurality of threads at an 
execution unit; and 

executing instructions from each instruction stream received at the execution unit 
in a multistage pipeline within the execution unit such that, at a given time, the multistage 
pipeline includes instructions from different ones of the instruction streams in different stages of 
the multistage pipeline, the instructions including a single instruction that specifies an operation 
to cause multiple instances of the operation to be performed, each instance of the operation to be 
performed using a different one of the plurality of data elements in partitioned 
fields of the at least one register to produce a catenated result. 

2. (original) The method of claim 1 wherein the number of threads executing 
within the execution unit is prime relative to a rate of execution of a slowest functional unit in 
the execution unit. 

3. (original) The method of claim 1 wherein the instructions from the plurality of 
instruction streams are executed in a round-robin manner. 
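
By way of illustration only, and not as part of the claim language, the round-robin interleaving recited in claims 1 and 3 can be sketched as a small simulation. The function name `simulate_round_robin`, the three-stage pipeline, and the instruction labels are assumptions introduced here for the example; the listing does not specify a stage count or an issue policy beyond round-robin order.

```python
from collections import deque


def simulate_round_robin(streams, num_stages, cycles):
    """Issue one instruction per cycle, rotating over threads, into a pipeline.

    streams: one list of instruction labels per thread (hypothetical data).
    Returns one snapshot per cycle; each snapshot lists, per stage,
    a (thread_id, instruction) pair, or None for an empty stage.
    """
    queues = [deque(s) for s in streams]
    pipeline = [None] * num_stages
    snapshots = []
    for cycle in range(cycles):
        thread_id = cycle % len(queues)  # round-robin thread selection
        issued = None
        if queues[thread_id]:
            issued = (thread_id, queues[thread_id].popleft())
        # The new instruction enters stage 0; the oldest one leaves the pipeline.
        pipeline = [issued] + pipeline[:-1]
        snapshots.append(list(pipeline))
    return snapshots


streams = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"]]
snaps = simulate_round_robin(streams, num_stages=3, cycles=6)
for cycle, snap in enumerate(snaps):
    print(cycle, snap)
```

In this sketch, once the pipeline has filled (cycle 2 onward), every stage holds an instruction from a different thread, which is the condition recited in the executing step of claim 1.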

4. (currently amended) The method of claim 1 wherein only one thread from the 
plurality of threads can handle an exception at a given time. 

5. (original) The method of claim 1 further comprising: 

decoding a second single instruction specifying a third and a fourth register each 
containing a plurality of floating-point operands; 

multiplying the plurality of floating point operands in the third register by the 
plurality of operands in the fourth register to produce a plurality of products; and 

providing the plurality of products to partitioned fields of a result register as a 
catenated result. 
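
For illustration only, and not as part of the claim language, the partitioned multiply of claim 5 can be sketched as follows. The 128-bit register width, the four 32-bit floating-point fields, and the names `pack_register`, `unpack_register`, and `group_multiply` are assumptions made for this example; the claim itself does not fix these widths or names.

```python
import struct

LANES = 4  # assumed: four 32-bit fields in one 128-bit register


def pack_register(values):
    """Pack LANES float32 values into one 128-bit integer 'register'."""
    return int.from_bytes(struct.pack(f"<{LANES}f", *values), "little")


def unpack_register(reg):
    """Split a 128-bit integer 'register' back into LANES float32 values."""
    return list(struct.unpack(f"<{LANES}f", reg.to_bytes(4 * LANES, "little")))


def group_multiply(reg_a, reg_b):
    """Multiply corresponding fields and return the catenated products."""
    products = [
        x * y for x, y in zip(unpack_register(reg_a), unpack_register(reg_b))
    ]
    return pack_register(products)


third = pack_register([1.0, 2.0, 3.0, 4.0])
fourth = pack_register([0.5, 0.5, 2.0, 2.0])
result = group_multiply(third, fourth)
print(unpack_register(result))  # [0.5, 1.0, 6.0, 8.0]
```

A single call to `group_multiply` stands in for the single instruction: it produces one product per field and returns them side by side in one result register.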

6. (currently amended) A computer-readable storage medium: 

having an instruction stream for each one of a plurality of threads that instruct a 
computer system to perform operations comprising, 

storing a plurality of data elements in partitioned fields of at least one register 
having a register width, each of the data elements having an elemental width smaller than the 
register width; 

receiving an instruction stream for each one of the plurality of threads at an 
execution unit; 

executing instructions from each instruction stream received at the execution unit 
in a multistage pipeline within the execution unit such that, at a given time, the multistage 
pipeline includes instructions from different ones of the instruction streams in different stages of 
the multistage pipeline, the instructions including a single instruction that specifies an operation 
to cause multiple instances of the operation to be performed, each instance of the operation to be 
performed using a different one of the plurality of data elements in partitioned 
fields of the at least one register to produce a catenated result. 

7. (previously presented) The computer-readable storage medium of claim 6 
wherein the number of threads executing within the execution unit is prime relative to a rate of 
execution of a slowest functional unit in the execution unit. 

8. (previously presented) The computer-readable storage medium of claim 6 
wherein the instructions from the plurality of instruction streams are executed in a round-robin 
manner. 

9. (currently amended) The computer-readable storage medium of claim 6 
wherein only one thread from the plurality of threads can handle an exception at a given 
time. 

10-14. (canceled) 

15. (new) The computer-readable storage medium of claim 6 wherein the 
computer system is to perform operations further comprising: 

decoding a second single instruction specifying a third and a fourth register each 
containing a plurality of floating-point operands; 

multiplying the plurality of floating point operands in the third register by the 
plurality of operands in the fourth register to produce a plurality of products; and 

providing the plurality of products to partitioned fields of a result register as a 
catenated result. 